TR-2009001: BISC: A Binary Itemset Support Counting Approach towards Efficient Frequent Itemset Mining

Authors

  • Jinlin Chen
  • Keli Xiao

Abstract

The performance of a depth-first Frequent Itemset Mining (FIM) algorithm is closely related to the total number of recursions, which can be modeled as $O(n^k)$, where k is the maximal recursion depth and n is the branching factor. Many existing approaches focus on improving support counting rather than on decreasing n and k, which may lead to unsatisfactory performance as n and k grow. In this paper a novel approach, Binary Itemset Support Counting (BISC), is presented to address these two factors. Let the direct support of an itemset I be the number of transactions whose itemset is exactly I. BISC can then derive the supports of all the itemsets in a database by iteratively updating their direct supports, thus eliminating the need for further recursion. BISC converts a database into its binary representation and combines one-stage BISC and two-stage BISC to minimize the cost of support updating and memory consumption by eliminating redundant updating operations. By applying BISC together with the basic projection technique, our approach can significantly decrease the maximum depth and branching factor of database projection, thus improving both the time and space efficiency of FIM. In terms of time efficiency, experiments show that BISC outperforms all the other algorithms (in many cases by almost an order of magnitude or more) on the datasets tested. Even though this does not guarantee that BISC will always perform best, the result is impressive given that most existing algorithms are efficient only on some types of datasets. The memory usage of BISC is comparable to (in most cases smaller than) that of the other algorithms. In summary, the concepts of direct support, binary representation, multi-stage BISC, and the optimization strategies applied in BISC represent a promising approach for related areas.

¹ The software for this algorithm is available at http://alpha.cs.qc.edu/research.html
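The abstract does not spell out how direct supports are turned into supports. A minimal Python sketch, under our own assumptions: each transaction is encoded as an integer bitmask (bit i set iff item i occurs), the direct support of an itemset is the number of transactions with exactly that bitmask, and the support of every itemset is then obtained by adding in the counts of supersets one item at a time (the standard sum-over-supersets update). The function names, the toy database, and this update order are illustrative assumptions; this is not claimed to be BISC's one-stage or two-stage procedure.

```python
from collections import Counter


def to_bitmask(transaction, item_index):
    """Encode a transaction as an int whose bit i is set iff item i occurs."""
    mask = 0
    for item in transaction:
        mask |= 1 << item_index[item]
    return mask


def direct_supports(transactions, item_index):
    """Direct support of itemset I = number of transactions whose itemset equals I."""
    return Counter(to_bitmask(t, item_index) for t in transactions)


def all_supports(transactions, items):
    """Derive the support of every itemset over `items` from direct supports.

    Sketch only (assumption): a dense table with 2**n entries, updated with
    the sum-over-supersets rule, one item (bit) at a time.
    """
    item_index = {item: i for i, item in enumerate(items)}
    n = len(items)
    counts = [0] * (1 << n)
    for mask, c in direct_supports(transactions, item_index).items():
        counts[mask] = c
    # After processing bit b, counts[mask] holds the sum of the direct
    # supports of all supersets of mask that differ from it only in bits 0..b.
    for b in range(n):
        bit = 1 << b
        for mask in range(1 << n):
            if not mask & bit:
                counts[mask] += counts[mask | bit]
    return counts


def frequent_itemsets(counts, items, min_support):
    """Return (itemset, support) pairs for non-empty itemsets meeting min_support."""
    result = []
    for mask, c in enumerate(counts):
        if mask and c >= min_support:
            itemset = tuple(items[i] for i in range(len(items)) if (mask >> i) & 1)
            result.append((itemset, c))
    return result


if __name__ == "__main__":
    # Hypothetical toy database.
    db = [{"a", "b"}, {"a", "b", "c"}, {"b", "c"}, {"a", "b"}]
    items = ["a", "b", "c"]
    counts = all_supports(db, items)
    print(counts)  # [4, 3, 4, 3, 2, 1, 2, 1]
    print(frequent_itemsets(counts, items, 2))
```

Index 0 of the table is the empty itemset, whose support is the number of transactions. Because the table has 2^n entries for n items, this sketch is only practical for small item sets, such as the few items that remain after projection.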


Similar resources

Fast Algorithms for Mining Interesting Frequent Itemsets without Minimum Support

Real-world datasets are sparse and dirty and contain hundreds of items. In such situations, discovering interesting rules (results) with the traditional frequent itemset mining approach, which requires a user-defined input support threshold, is not appropriate. Since without any domain knowledge, setting the support threshold small or large can output nothing or a large number of redundant uninteresting res...

Concurrent Processing of Frequent Itemset Queries Using FP-Growth Algorithm

Discovery of frequent itemsets is a very important data mining problem with numerous applications. Frequent itemset mining is often regarded as advanced querying where a user specifies the source dataset and pattern constraints using a given constraint model. A significant amount of research on frequent itemset mining has been done so far, focusing mainly on developing faster complete mining al...

Mining Frequent Sequences Using Itemset-Based Extension

In this paper, we systematically explore an itemset-based extension approach for generating candidate sequences, which contributes to better and more straightforward search space traversal performance than the traditional item-based extension approach. Based on this candidate generation approach, we present FINDER, a novel algorithm for discovering the set of all frequent sequences. FINDER is compo...

Ramp: High Performance Frequent Itemset Mining with Efficient Bit-Vector Projection Technique

Mining frequent itemsets using the bit-vector representation approach is very efficient for small dense datasets but highly inefficient for sparse datasets, due to the lack of an efficient bit-vector projection technique. In this paper we present a novel, efficient bit-vector projection technique for sparse and dense datasets. We also present a new frequent itemset mining algorithm Ramp (Real Algorithm...

Publication date: 2016